PROTEST-ER: Retraining BERT for Protest Event Extraction
We analyze the effect of further retraining BERT with different domain-specific data as an unsupervised domain adaptation strategy for event extraction. Portability of event extraction models is particularly challenging, with large performance drops affecting even data of the same text genre (e.g., news). We present PROTEST-ER, a retrained BERT model for protest event extraction. PROTEST-ER outperforms a corresponding generic BERT on out-of-domain data by 8.1 points. Our best-performing models reach F1 scores of 51.91 and 46.39 across the two domains.
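The retraining step described above relies on BERT's standard masked-language-model objective applied to unlabeled in-domain text. A minimal pure-Python sketch of the usual corruption recipe (the 15% mask rate and the 80/10/10 split are the standard BERT defaults, not details given in this abstract; the vocabulary and token names are illustrative):

```python
import random

MASK = "[MASK]"
VOCAB = ["protest", "police", "march", "city", "arrest"]  # toy vocabulary


def mask_tokens(tokens, mask_prob=0.15, seed=0):
    """Apply the BERT MLM corruption: each token is selected with
    probability mask_prob; of the selected tokens, 80% become [MASK],
    10% become a random vocabulary token, 10% are left unchanged.
    Returns (corrupted tokens, labels), where labels hold the original
    token at selected positions and None elsewhere."""
    rng = random.Random(seed)
    out, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            labels.append(tok)
            r = rng.random()
            if r < 0.8:
                out.append(MASK)
            elif r < 0.9:
                out.append(rng.choice(VOCAB))
            else:
                out.append(tok)
        else:
            labels.append(None)
            out.append(tok)
    return out, labels


tokens = "protesters marched through the city centre".split()
corrupted, labels = mask_tokens(tokens)
```

During retraining, the model is asked to recover the original token at every position where the label is not None; running this objective over in-domain text is what adapts the generic model to the target domain.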
Automated Extraction of Socio-political Events from News (AESPEN): Workshop and Shared Task Report
We describe our effort on automated extraction of socio-political events from
news in the scope of a workshop and a shared task we organized at the Language
Resources and Evaluation Conference (LREC 2020). We believe the event
extraction studies in computational linguistics and social and political
sciences should further support each other in order to enable large scale
socio-political event information collection across sources, countries, and
languages. The event consists of regular research papers and a shared task
track on event sentence coreference identification (ESCI). All
submissions were reviewed by five members of the program committee. The
workshop attracted research papers related to evaluation of machine learning
methodologies, language resources, material conflict forecasting, and a shared
task participation report in the scope of socio-political event information
collection. The workshop demonstrated the volume and variety of both the data
sources and the event information collection approaches related to
socio-political events, and the need to bridge the gap between automated
text-processing techniques and the requirements of the social and political
sciences.
Challenges and Applications of Automated Extraction of Socio-political Events from Text (CASE 2022): Workshop and Shared Task Report
We provide a summary of the fifth edition of the CASE workshop, held in the
scope of EMNLP 2022. The workshop consists of regular papers, two
keynotes, working papers of shared task participants, and task overview papers.
This workshop has been bringing together all aspects of event information
collection across technical and social science fields. In addition to the
progress in depth, the submission and acceptance of multimodal approaches show
the widening of this interdisciplinary research topic.Comment: to appear at CASE 2022 @ EMNLP 202
Multimodal and Multilingual Understanding of Smells using ViLBERT and mUNITER
We evaluate state-of-the-art multimodal models for detecting common olfactory references in multilingual text and images, in the scope of the Multimodal Understanding of Smells in Texts and Images (MUSTI) task at MediaEval'22. The goal of MUSTI Subtask 1 is to classify paired texts and images as to whether they refer to the same smell source or not. We approach this task as a visual entailment problem and evaluate the performance of the English model ViLBERT and the multilingual model mUNITER on MUSTI Subtask 1. Although the base ViLBERT and mUNITER models perform worse than a dummy baseline, fine-tuning these models improves performance significantly in almost all scenarios. We find that fine-tuning mUNITER with SNLI-VE and the MUSTI training data performs better than the other configurations we implemented. Our experiments demonstrate that the task presents some challenges, but it is by no means impossible. Our code is available at https://github.com/Odeuropa/musti-eval-baselines
Event Causality Identification with Causal News Corpus -- Shared Task 3, CASE 2022
The Event Causality Identification Shared Task of CASE 2022 involved two
subtasks working on the Causal News Corpus. Subtask 1 required participants to
predict if a sentence contains a causal relation or not. This is a supervised
binary classification task. Subtask 2 required participants to identify the
Cause, Effect and Signal spans per causal sentence. This could be seen as a
supervised sequence labeling task. For both subtasks, participants uploaded
their predictions for a held-out test set, and ranking was done based on binary
F1 and macro F1 scores for Subtask 1 and 2, respectively. This paper summarizes
the work of the 17 teams that submitted their results to our competition and 12
system description papers that were received. The best F1 scores achieved for
Subtask 1 and 2 were 86.19% and 54.15%, respectively. All the top-performing
approaches involved pre-trained language models fine-tuned to the targeted
task. We further discuss these approaches and analyze errors across
participants' systems in this paper.
Comment: accepted to the 5th Workshop on Challenges and Applications of
Automated Extraction of Socio-political Events from Text (CASE 2022)
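The ranking metrics named in the abstract above are straightforward to reproduce. A minimal sketch of binary F1 (Subtask 1, scoring the positive "causal" class) and macro F1 (Subtask 2, an unweighted average over the per-label scores); the label values here are illustrative, not the task's actual span encoding:

```python
def f1(tp, fp, fn):
    """Harmonic mean of precision and recall; 0.0 when undefined."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    return 2 * p * r / (p + r) if p + r else 0.0


def binary_f1(gold, pred):
    """F1 of the positive class (label 1), as used to rank Subtask 1."""
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    return f1(tp, fp, fn)


def macro_f1(gold, pred, labels):
    """Unweighted mean of per-label F1, as used to rank Subtask 2
    (over Cause/Effect/Signal in the real task)."""
    scores = []
    for lab in labels:
        tp = sum(1 for g, p in zip(gold, pred) if g == lab and p == lab)
        fp = sum(1 for g, p in zip(gold, pred) if g != lab and p == lab)
        fn = sum(1 for g, p in zip(gold, pred) if g == lab and p != lab)
        scores.append(f1(tp, fp, fn))
    return sum(scores) / len(scores)


score = binary_f1([1, 0, 1, 1], [1, 0, 0, 1])  # precision 1.0, recall 2/3 -> 0.8
```

Note that binary F1 ignores performance on the negative class entirely, whereas macro F1 weights every label equally regardless of its frequency, which is why the two subtasks use different averages.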